14 research outputs found

SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation

Author: Bazaga Adrián
Liò Pietro
Micklem Gos
Publication date: 26/10/2023

In recent years, there has been growing interest in text-to-SQL translation, which is the task of converting natural language questions into executable SQL queries. This technology is important for its potential to democratize data extraction from databases. However, some of its key hurdles include domain generalisation, which is the ability to adapt to previously unseen databases, and alignment of natural language questions with the corresponding SQL queries. To overcome these challenges, we introduce SQLformer, a novel Transformer architecture specifically crafted to perform text-to-SQL translation tasks. Our model predicts SQL queries as abstract syntax trees (ASTs) in an autoregressive way, incorporating structural inductive bias in the encoder and decoder layers. This bias, guided by database table and column selection, aids the decoder in generating SQL query ASTs represented as graphs in a Breadth-First Search canonical order. Comprehensive experiments illustrate the state-of-the-art performance of SQLformer in the challenging text-to-SQL Spider benchmark. Our implementation is available at https://github.com/AdrianBZG/SQLformer
Comment: 11 pages, 4 figures
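As a rough illustration of what a breadth-first canonical order over a query AST could look like, the sketch below linearises a small tree level by level. The `Node` class, the `bfs_order` helper, and the node labels are hypothetical and chosen only for illustration; they are not taken from the SQLformer implementation or its grammar.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical AST node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def bfs_order(root: Node) -> List[str]:
    """Return node labels in breadth-first order (level by level, left to right)."""
    order: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.label)
        queue.extend(node.children)  # children are visited after the current level
    return order


# Toy tree for: SELECT name FROM singer WHERE age > 30 (labels are illustrative only)
toy_ast = Node("query", [
    Node("select", [Node("column:name")]),
    Node("from", [Node("table:singer")]),
    Node("where", [Node(">", [Node("column:age"), Node("value:30")])]),
])

linearised = bfs_order(toy_ast)
```

Fixing one traversal order gives every tree a single target sequence, which is one plausible reason to prefer a canonical order when a decoder generates the tree step by step.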

arXiv.org e-Print Archive

Unsupervised Fact Verification by Language Model Distillation

Author: Bazaga Adrián
Liò Pietro
Micklem Gos
Publication date: 28/09/2023

Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.

arXiv.org e-Print Archive

A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies

Author: Badosa Carmen
Bazaga Adrián
Jiménez-Mallebrera Cecilia
Porta Josep M.
Roldán Mònica
Publication date: 30/01/2019

The development of machine learning systems for the diagnosis of rare diseases is challenging, mainly due to the lack of data to study them. Despite this challenge, this paper proposes a system for the Computer Aided Diagnosis (CAD) of low-prevalence, congenital muscular dystrophies from confocal microscopy images. The proposed CAD system relies on a Convolutional Neural Network (CNN) which performs an independent classification for non-overlapping patches tiling the input image, and generates an overall decision summarizing the individual decisions for the patches on the query image. This decision scheme points to the possibly problematic areas in the input images and provides a global quantitative evaluation of the state of the patients, which is fundamental for diagnosis and to monitor the efficiency of therapies.
Comment: Submitted for review to Expert Systems With Applications
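The patch-wise decision scheme described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: the abstract does not say how the per-patch decisions are summarised, so a majority vote is assumed here, and `tile_patches`, `classify_image`, and the threshold-based `stand_in_classifier` are hypothetical stand-ins for the paper's trained CNN.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # 2-D grid of pixel intensities


def tile_patches(image: Image, patch: int) -> List[List[Sequence[float]]]:
    """Split the image into non-overlapping patch x patch tiles (ragged borders are dropped)."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            tiles.append([row[c:c + patch] for row in image[r:r + patch]])
    return tiles


def classify_image(image: Image, patch: int,
                   classify_patch: Callable[[List[Sequence[float]]], str]
                   ) -> Tuple[str, List[str]]:
    """Classify each tile independently, then summarise by majority vote (an assumption)."""
    labels = [classify_patch(tile) for tile in tile_patches(image, patch)]
    overall = Counter(labels).most_common(1)[0][0]
    return overall, labels


def stand_in_classifier(tile: List[Sequence[float]]) -> str:
    """Hypothetical placeholder for a CNN: flags tiles whose mean intensity exceeds 5."""
    mean = sum(sum(row) for row in tile) / (len(tile) * len(tile[0]))
    return "flagged" if mean > 5 else "normal"


toy_image = [
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
overall, per_patch = classify_image(toy_image, 2, stand_in_classifier)
```

The per-patch labels are returned alongside the overall decision so that the flagged regions can still be located in the image.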

arXiv.org e-Print Archive

Digital.CSIC

Genome-wide investigation of gene-cancer associations for the prediction of novel therapeutic targets in oncology.

Author: Bazaga Adrián
Leggate Dan
Weisser Hendrik
Publication venue: Sci Rep
Publication date: 13/06/2020

A major cause of failed drug discovery programs is suboptimal target selection, resulting in the development of drug candidates that are potent inhibitors, but ineffective at treating the disease. In the genomics era, the availability of large biomedical datasets with genome-wide readouts has the potential to transform target selection and validation. In this study we investigate how computational intelligence methods can be applied to predict novel therapeutic targets in oncology. We compared different machine learning classifiers applied to the task of drug target classification for nine different human cancer types. For each cancer type, a set of "known" target genes was obtained and equally-sized sets of "non-targets" were sampled multiple times from the human protein-coding genes. Models were trained on mutation, gene expression (TCGA), and gene essentiality (DepMap) data. In addition, we generated a numerical embedding of the interaction network of protein-coding genes using deep network representation learning and included the results in the modeling. We assessed feature importance using a random forest classifier and performed feature selection based on measuring permutation importance against a null distribution. Our best models achieved good generalization performance based on the AUROC metric. With the best model for each cancer type, we ran predictions on more than 15,000 protein-coding genes to identify potential novel targets. Our results indicate that this approach may be useful to inform early stages of the drug discovery pipeline.
Innovate UK Knowledge Transfership Programme grant KTP01126

Apollo (Cambridge)

Translating synthetic natural language to database queries with a polyglot deep learning framework

Author: Bazaga Adrián
Gunwant Nupur
Micklem Gos
Publication venue: Scientific Reports
Publication date: 14/04/2021

Abstract: The number of databases as well as their size and complexity is increasing. This creates a barrier to use, especially for non-experts, who have to come to grips with the nature of the data, the way it has been represented in the database, and the specific query languages or user interfaces by which data are accessed. These difficulties worsen in research settings, where it is common to work with many different databases. One approach to improving this situation is to allow users to pose their queries in natural language. In this work we describe a machine learning framework, Polyglotter, that in a general way supports the mapping of natural language searches to database queries. Importantly, it does not require the creation of manually annotated data for training and therefore can be applied easily to multiple domains. The framework is polyglot in the sense that it supports multiple different database engines that are accessed with a variety of query languages, including SQL and Cypher. Furthermore, Polyglotter supports multi-class queries. Good performance is achieved on both toy and real databases, as well as a human-annotated WikiSQL query set. Thus Polyglotter may help database maintainers make their resources more accessible.

arXiv.org e-Print Archive

PubMed Central

Apollo (Cambridge)

A Convolutional Neural Network for the Automatic Diagnosis of Collagen VI related Muscular Dystrophies

Author: Bazaga Adrián
Roldán Mònica
Badosa Carmen
Jiménez-Mallebrera Cecilia
Porta Josep M.
Publication date: 01/01/2019

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Synthesis and characterization of M(II) phosphonates (M = Fe, Co, Zn, Mn) as precursors for PEMFCs electrocatalysts

Author: Bazaga García Montse
Cabeza-Diaz Aurelio
Losilla Enrique R.
López-Vergara Adrián
Olivera-Pastor Pascual
Porras Vázquez José Manuel
Ruiz Salcedo Inés
Publication date: 15/11/2018

Metal phosphonates are promising precursors for applications such as proton conductivity [1] and catalysis [2]. Specifically, upon calcination, metal polyphosphates are generated that can be used as non-noble metal alternatives [3] to the highly expensive commercial catalysts (Pt) for proton exchange membrane fuel cells (PEMFCs). In this work, we present the synthesis and characterization of metal polyphosphates obtained from divalent transition metal phosphonates (M = Fe, Mn, Co and Zn), both as monometallic and bimetallic systems (solid solutions). For the preparation of the metal phosphonate precursors, two types of organic linkers were selected, i.e. 2-R,S-hydroxyphosphonoacetic acid [HO3PCH(OH)COOH, HPAA] and nitrilotrismethylenephosphonic acid [N(CH2PO3H2)3, ATMP]. The as-synthesized compounds were calcined between 700 and 1000 °C under N2. Depending on the metal/phosphorus molar ratio in the precursor phases, different compositions were found, the corresponding metal pyrophosphate being the major component according to the crystallographic data. Interestingly, in most cases the solid solutions were preserved in the final product, for instance Fe-Mn, Fe-Co and Fe-Zn. All calcined materials have also been characterized by XPS, SEM/EDS, and FTIR-Raman.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

Repositorio Institucional Universidad de Málaga

Transition metal hydroxyphosphonoacetates as precursors of electrocatalysts

Author: Bazaga-García Montse
Cabeza-Diaz Aurelio
López-Vergara Adrián
Olivera-Pastor Pascual
Pérez Colodrero Rosario Mercedes
Ruiz Salcedo Inés
Vilchez Cózar Álvaro
Publication date: 29/01/2021

Conference contribution
Coordination polymers (PCs) are widely studied due to their applicability in many fields. Among them, metal phosphonates are attractive materials due to their great structural and functional diversity, serving as proton conductors and/or precursors of electrocatalysts that offer an alternative to the high-cost commercial catalysts based on noble metals, for both PEMFCs and electrolytic systems. In this work, we report the synthesis, characterization and electrochemical properties of several coordination polymers derived from (R,S)-2-hydroxyphosphonoacetic acid (HPAA) with transition metals (M(II) = Fe, Co, Mn, Ni) as well as their solid solutions. The precursor PCs decompose, upon heating under different conditions, to the corresponding metal oxalate solid solutions, which are then used as intermediate materials for obtaining new Non-Precious Metal Electrocatalysts (NPMCs) by pyrolytic treatment at different temperatures under N2/H2 atmospheres. The electrochemical behavior of these compounds with regard to the Oxygen Evolution and Reduction Reactions (OER and ORR, respectively) shows that the structural features are of considerable importance to their electrocatalytic activities.

Repositorio Institucional Universidad de Málaga

Genome-wide investigation of gene-cancer associations using machine learning on biomedical big data

Author: Rodríguez Bazaga Adrián
Publication venue: Universitat Politècnica de Catalunya
Publication date: 05/07/2019

UPCommons. Portal del coneixement obert de la UPC

Genome-wide investigation of gene-cancer associations using machine learning on biomedical big data

Author: Rodríguez Bazaga Adrián
Publication venue: Universitat Politècnica de Catalunya

RECERCAT